{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# General-purpose prediction with DNNs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Multi-layer deep networks are powerful predictors and often outperform classical models like kernel-SVMs and gradient-boosted trees. Here we'll apply a simple multi-layer network to classification of Higgs boson data.\n",
    "\n",
    "Let's load BIDMat/BIDMach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import BIDMat.{CMat,CSMat,DMat,Dict,IDict,Image,FMat,FND,GDMat,GMat,GIMat,GSDMat,GSMat,HMat,IMat,Mat,SMat,SBMat,SDMat}\n",
    "import BIDMat.MatFunctions._\n",
    "import BIDMat.SciFunctions._\n",
    "import BIDMat.Solvers._\n",
    "import BIDMat.JPlotting._\n",
    "import BIDMach.Learner\n",
    "import BIDMach.models.{FM,GLM,KMeans,KMeansw,ICA,LDA,LDAgibbs,Model,NMF,RandomForest,SFA,SVD}\n",
    "import BIDMach.networks.{Net}\n",
    "import BIDMach.datasources.{DataSource,MatSource,FileSource,SFileSource}\n",
    "import BIDMach.mixins.{CosineSim,Perplexity,Top,L1Regularizer,L2Regularizer}\n",
    "import BIDMach.updaters.{ADAGrad,Batch,BatchNorm,IncMult,IncNorm,Telescoping}\n",
    "import BIDMach.causal.{IPTW}\n",
    "\n",
    "Mat.checkMKL\n",
    "Mat.checkCUDA\n",
    "Mat.setInline\n",
    "if (Mat.hasCUDA > 0) GPUmem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And define the root directory for this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "val dir = \"/code/BIDMach/data/uci/Higgs/parts/\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Constructing a deep network Learner"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The \"Net\" class is the parent class for deep networks. By defining a learner, we also configure a datasource, an optimization method, and possibly a regularizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "val (mm, opts) = Net.learner(dir+\"data%03d.fmat.lz4\", dir+\"label%03d.fmat.lz4\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next step is to define the network to run. First we set some options:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "opts.hasBias = true;                    // Include additive bias in linear layers\n",
    "opts.links = iones(1,1);                // The link functions specify output loss, 1= logistic\n",
    "opts.nweight = 1e-4f                    // weight for normalization layers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we define the network itself. We use the function \"dnodes3\", which builds a stack of 3-node blocks, each consisting of a linear, a non-linear and a normalization layer. The non-linearity is configurable. The arguments to the function are:\n",
    "* depth:Int the total number of nodes in the network, including the input and output nodes (rounded down to a multiple of 3)\n",
    "* width:Int the number of units in the first hidden layer (not the input layer, which is set by the data source)\n",
    "* taper:Float the multiplicative factor by which the width shrinks from one hidden layer to the next\n",
    "* ntargs:Int the number of targets to predict\n",
    "* opts:Opts the options above\n",
    "* nonlin:Int the type of non-linear layer: 1=tanh, 2=sigmoid, 3=rectifying, 4=softplus"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "opts.nodeset = Net.dnodes3(12, 500, 0.6f, 1, opts, 2);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's the source for dnodes3. It creates a \"nodeset\", or flow graph, for the network. The NodeSet is called \"nodes\" and can be accessed like an array: nodes(i) is set to a node whose input is nodes(i-1), and so on."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<pre>\n",
    "  def dnodes3(depth0:Int, width:Int, taper:Float, ntargs:Int, opts:Opts, nonlin:Int = 1):NodeSet = {\n",
    "    val depth = (depth0/3)*3;              // Round down to a multiple of 3 nodes\n",
    "    val nodes = new NodeSet(depth);\n",
    "    var w = width;\n",
    "    nodes(0) = new InputNode;\n",
    "    for (i &lt;- 1 until depth-2) {\n",
    "    \tif (i % 3 == 1) {\n",
    "    \t\tnodes(i) = new LinNode{inputs(0)=nodes(i-1); outdim=w; hasBias=opts.hasBias; aopts=opts.aopts};\n",
    "    \t\tw = (taper*w).toInt;\n",
    "    \t} else if (i % 3 == 2) {\n",
    "    \t  nonlin match {\n",
    "    \t    case 1 => nodes(i) = new TanhNode{inputs(0)=nodes(i-1)};\n",
    "    \t    case 2 => nodes(i) = new SigmoidNode{inputs(0)=nodes(i-1)};\n",
    "    \t    case 3 => nodes(i) = new RectNode{inputs(0)=nodes(i-1)};\n",
    "    \t    case 4 => nodes(i) = new SoftplusNode{inputs(0)=nodes(i-1)};\n",
    "    \t  }\n",
    "    \t} else {\n",
    "    \t\tnodes(i) = new NormNode{inputs(0)=nodes(i-1); targetNorm=opts.targetNorm; weight=opts.nweight};\n",
    "    \t}\n",
    "    }\n",
    "    nodes(depth-2) = new LinNode{inputs(0)=nodes(depth-3); outdim=ntargs; hasBias=opts.hasBias; aopts=opts.aopts};\n",
    "    nodes(depth-1) = new GLMNode{inputs(0)=nodes(depth-2); links=opts.links};\n",
    "    nodes;\n",
    "  }\n",
    "  </pre>"
   ]
  },
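  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the loop in dnodes3 concrete, here is a minimal plain-Scala sketch that lists the node kinds and widths the loop would produce. The helper `layout` is hypothetical (it is written for this note and is not part of BIDMach), and the strings it returns are only labels for the node types in the source above.\n",
    "\n",
    "```scala\n",
    "// Plain Scala, no BIDMach required. `layout` is a hypothetical helper, not a library function.\n",
    "def layout(depth0: Int, width: Int, taper: Float, ntargs: Int): Seq[String] = {\n",
    "  val depth = (depth0 / 3) * 3          // same rounding as dnodes3: down to a multiple of 3\n",
    "  var w = width\n",
    "  val blocks = (1 until depth - 2).map { i =>\n",
    "    if (i % 3 == 1) {\n",
    "      val label = s\"Lin($w)\"            // linear node with the current width\n",
    "      w = (taper * w).toInt             // shrink the width for the next block\n",
    "      label\n",
    "    } else if (i % 3 == 2) \"Nonlin\"     // tanh, sigmoid, rectifying or softplus\n",
    "    else \"Norm\"                         // normalization node\n",
    "  }\n",
    "  (\"Input\" +: blocks) ++ Seq(s\"Lin($ntargs)\", \"GLM\")\n",
    "}\n",
    "\n",
    "// Print one line per node index, mirroring nodes(i) in the source.\n",
    "layout(12, 500, 0.6f, 1).zipWithIndex.foreach { case (kind, i) => println(s\"nodes($i) = $kind\") }\n",
    "```\n",
    "\n",
    "For the call used in this notebook, `layout(12, 500, 0.6f, 1)`, the sketch lists linear nodes of width 500, 300 and 180, each followed by a non-linear and a normalization node, and then the single-output linear node and the GLM output node."
   ]
  },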
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tuning Options"
   ]
  },
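  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following options control the range of data files, the number of passes, the minibatch size, the evaluation interval and the learning rate:"
   ]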
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "opts.nend = 10                         // The last file number in the datasource\n",
    "opts.npasses = 5                       // How many passes to make over the data\n",
    "opts.batchSize = 200                   // The minibatch size\n",
    "opts.evalStep = 511                    // Number of minibatches between evaluation steps\n",
    "\n",
    "opts.lrate = 0.01f;                    // Learning rate\n",
    "opts.texp = 0.4f;                      // Time exponent for ADAGrad"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You invoke the learner the same way as before. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "mm.train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's extract the model and use it to predict labels on a held-out sample of data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "val model = mm.model.asInstanceOf[Net]\n",
    "\n",
    "val ta = loadFMat(dir + \"data%03d.fmat.lz4\" format 10);\n",
    "val tc = loadFMat(dir + \"label%03d.fmat.lz4\" format 10);\n",
    "\n",
    "val (nn,nopts) = Net.predictor(model, ta);\n",
    "nopts.batchSize=10000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's run the predictor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "nn.predict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To evaluate, we extract the predictions as a floating matrix, and then compute a ROC curve with them. The mean of this curve is the AUC (Area Under the Curve)."
   ]
  },
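  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A brief justification for using the mean, assuming `roc` returns the true-positive rate sampled at evenly spaced false-positive rates (an assumption about its output, worth checking against the BIDMat source): the mean of the $N$ samples is then a Riemann-sum approximation of the area under the curve.\n",
    "\n",
    "$$\\mathrm{AUC} = \\int_0^1 \\mathrm{TPR}(f)\\,df \\approx \\frac{1}{N}\\sum_{i=1}^{N}\\mathrm{TPR}(f_i)$$\n",
    "\n",
    "Here $f$ is the false-positive rate, $f_i$ are the evenly spaced sample points, and $N$ is the number of samples, determined by the bin count of 1000 passed to `roc` in this notebook."
   ]
  },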
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "val pc = FMat(nn.preds(0))\n",
    "val rc = roc(pc, tc, 1-tc, 1000);\n",
    "mean(rc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot(rc)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Not bad! This net gives competitive performance on the Kaggle challenge for Higgs boson classification."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tuning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This net can be optimized in a variety of ways. Try adding an extra block of layers (you should increase the net depth by 3) and re-running. You may need to restart the notebook."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "scala",
   "version": "2.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
